Application of different factor analysis methods

scRNA-seq data is high-dimensional, as every analyzed gene corresponds to a separate dimension. Dimension reduction methods are therefore necessary to make the data workable. They not only allow the data to be visualized but also form the basis for clustering it.

Note: Running this notebook takes approximately 2 h because of the Latent Dirichlet Allocation (LDA) and Linear Discriminant Analysis calculations.

Loading the necessary libraries

Set random seed
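A minimal sketch of fixing the random seeds so that repeated runs give the same results. The seed value `SEED = 0` is a hypothetical choice; the value used in the original notebook is not shown.

```python
import random

import numpy as np

SEED = 0  # hypothetical value, not taken from the original notebook

# Seed Python's and NumPy's global random number generators
random.seed(SEED)
np.random.seed(SEED)
first_draw = np.random.rand(3)

# Re-seeding reproduces exactly the same random numbers
np.random.seed(SEED)
second_draw = np.random.rand(3)
```

scikit-learn estimators do not read a notebook-wide seed automatically; the same value can be passed to each of them through their `random_state` parameter.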

Factor Analysis algorithms

Principal component analysis (PCA)

Note: Calculating the PCA with sklearn or with scanpy gives similar results.

Load preprocessed data (see notebook Data_Preprocessing)

Calculate PCA
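The PCA step might look roughly like the following sketch with sklearn. The matrix `X` is a synthetic stand-in for the preprocessed cells x genes expression matrix, and the number of components (10) is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for a preprocessed expression matrix (cells x genes)
X = rng.normal(size=(200, 50))

pca = PCA(n_components=10, random_state=0)
cell_scores = pca.fit_transform(X)        # cells x components
gene_loadings = pca.components_.T         # genes x components
explained = pca.explained_variance_ratio_ # variance fraction per component
```

In an AnnData workflow, arrays like `cell_scores` and `gene_loadings` would typically be kept in `.obsm` and `.varm`, which is where the later sections of this notebook look for them.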

Plotting

Independent component analysis (ICA)

Load preprocessed data (see notebook Data_Preprocessing)

Calculate ICA
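A hedged sketch of the ICA step using sklearn's `FastICA`. The data are synthetic (five non-Gaussian sources mixed into 40 "genes"), and the number of components is an assumption rather than the notebook's setting.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
# Synthetic data: 5 independent non-Gaussian sources mixed into 40 features
sources = rng.laplace(size=(300, 5))
mixing = rng.normal(size=(5, 40))
X = sources @ mixing + 0.01 * rng.normal(size=(300, 40))

ica = FastICA(n_components=5, random_state=0, max_iter=1000)
cell_scores = ica.fit_transform(X)  # cells x independent components
gene_weights = ica.mixing_          # genes x independent components
```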

Plotting

Non-negative matrix factorization (NMF)

Load raw data

Calculate NMF
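A minimal sketch of the NMF step with sklearn. NMF requires non-negative input, which is presumably why the raw data are loaded here instead of the preprocessed data. The Poisson counts, the number of components, and the `nndsvda` initialization are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Synthetic non-negative count matrix (cells x genes)
X = rng.poisson(lam=2.0, size=(150, 40)).astype(float)

nmf = NMF(n_components=5, init="nndsvda", random_state=0, max_iter=500)
cell_factors = nmf.fit_transform(X)  # cells x factors, all entries >= 0
gene_factors = nmf.components_       # factors x genes, all entries >= 0
```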

Plotting

Latent Dirichlet Allocation (LDA)

Load raw data

Calculate LDA
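The Latent Dirichlet Allocation step could be sketched as follows with sklearn. The model treats each cell as a "document" of gene counts, so it is fitted on count data. The synthetic counts, the number of topics, and `max_iter=10` are assumptions chosen to keep the example fast.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# Synthetic count matrix (cells x genes)
counts = rng.poisson(lam=1.5, size=(120, 30))

lda = LatentDirichletAllocation(n_components=4, random_state=0, max_iter=10)
topic_weights = lda.fit_transform(counts)  # cells x topics, rows sum to 1
gene_topic = lda.components_               # topics x genes
```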

Plotting

Linear Discriminant Analysis

Load raw data

Calculate Linear Discriminant Analysis

The algorithm needs an array y that contains a target value (class label) for each cell. Here, the annotation cytokine.condition is used.
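A sketch of this supervised step with sklearn's `LinearDiscriminantAnalysis`. The label names `condA`, `condB`, and `condC` are hypothetical stand-ins for the values of cytokine.condition, and the expression matrix is synthetic.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Hypothetical condition labels, one per cell (stand-in for cytokine.condition)
y = np.repeat(["condA", "condB", "condC"], 40)

# Synthetic expression matrix (cells x genes) with condition-specific shifts
X = rng.normal(size=(120, 20))
X[y == "condB", 0] += 3.0
X[y == "condC", 1] += 3.0

# At most (number of classes - 1) discriminant components are available
lda = LinearDiscriminantAnalysis(n_components=2)
cell_scores = lda.fit_transform(X, y)  # cells x discriminant components
```

Unlike the other methods in this notebook, this one is supervised: the components are chosen to separate the given labels, so the number of components is limited by the number of distinct conditions.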

Plotting

Data-driving genes identified by the different Factor Analysis methods

Example Analysis based on PCA Factor Analysis

To explore the genes in the components identified by the other methods, change the keys in .varm[] and .obsm[] to the variable names chosen for the respective factor analysis calculation.

Plotting

Access genes in the components
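One way to read out the genes that drive a component is to rank them by their loading, as in this sketch. The `loadings` array is a random stand-in for a genes x components matrix such as the one stored in `.varm`, and the gene names and the helper `top_genes` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical gene names and a random stand-in for a loadings matrix
gene_names = np.array([f"gene_{i}" for i in range(30)])
loadings = rng.normal(size=(30, 5))  # genes x components


def top_genes(loadings, gene_names, component, n_top=5):
    """Return the genes with the highest and the lowest loadings."""
    order = np.argsort(loadings[:, component])  # ascending by loading
    highest = gene_names[order[-n_top:][::-1]]  # largest loading first
    lowest = gene_names[order[:n_top]]          # smallest loading first
    return highest, lowest


positive_genes, negative_genes = top_genes(loadings, gene_names, component=0)
```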

Gene Set Enrichment Analysis

Cluster Identification

The components calculated by the different factor analysis methods can be used for further analysis in scanpy (similar to the notebooks 'Exploring_Cell-Types_in_UNS_data.ipynb' and 'Exploring_Cell-Types_in_stimulated_data.ipynb'). For this purpose, the results have to be stored under the variable names used for the PCA, e.g. adata.obsm['X_ica'] as canogamez.obsm['X_pca'] and adata.varm['ICs'] as canogamez.varm['PCs'].